{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 在千帆 Python SDK 使用 OpenCompass 提供的功能\n",
    "\n",
    "OpenCompass是由上海人工智能实验室开源的大模型评测平台。它涵盖了学科、语言、知识、理解、推理等五大评测维度，能够全面评估大模型的能力。OpenCompass作为一个评测工具，对于研究和开发大模型的人员来说，是非常有价值的资源。通过使用OpenCompass，用户可以更准确地了解他们的大模型在各项任务上的表现，从而进行针对性的优化和改进。\n",
    "\n",
    "千帆 Python SDK 中内置的评估模块，支持用户使用 OpenCompass 提供的评估器和 `TEXT_POSTPROCESSORS` 对象，对模型的推理结果进行评估前的预处理，以及评估。\n",
    "\n",
    "# 前置准备\n",
    "\n",
    "首先，需要安装千帆 Python SDK"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "vscode": {
     "languageId": "shellscript"
    }
   },
   "outputs": [],
   "source": [
    "pip install -U qianfan"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "然后再安装 OpenCompass。这部分的教程可以参考 OpenCompass 所提供的[官方文档](https://opencompass.org.cn/doc)，或者直接使用以下命令在 Python 中安装："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "vscode": {
     "languageId": "shellscript"
    }
   },
   "outputs": [],
   "source": [
    "!git clone https://github.com/open-compass/opencompass opencompass\n",
    "!cd opencompass\n",
    "!pip install -e ."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 正文\n",
    "\n",
    "为了在千帆 Python SDK 的评估模块中使用来自 OpenCompass 的评估器，用户需要使用 `qianfan.evaluation.opencompass_evaluator` 中提供的 `OpenCompassLocalEvaluator` 类，将 OpenCompass 评估器包装为千帆 Python SDK 的评估器。\n",
    "\n",
    "OpenCompass 所有可以使用的评估器都存放在 `opencompass.openicl.icl_evaluator` 模块下。其中，只有那些 `score` 函数仅包含 `predictions` 和 `references` 两个参数的评估器，才能用于千帆 Python SDK 提供的 `OpenCompassLocalEvaluator` 类。这部分符合条件的评估器包括：\n",
    "\n",
    "+ opencompass.openicl.icl_evaluator.icl_agent_evaluator.PassRateEvaluator\n",
    "+ opencompass.openicl.icl_evaluator.icl_circular_evaluator.CircularEvaluator\n",
    "+ opencompass.openicl.icl_evaluator.icl_em_evaluator.EMEvaluator\n",
    "+ opencompass.openicl.icl_evaluator.icl_hf_evaluator.AccEvaluator\n",
    "+ opencompass.openicl.icl_evaluator.icl_hf_evaluator.RougeEvaluator\n",
    "+ opencompass.openicl.icl_evaluator.icl_hf_evaluator.BleuEvaluator\n",
    "+ opencompass.openicl.icl_evaluator.icl_hf_evaluator.BleuFloresEvaluator\n",
    "+ opencompass.openicl.icl_evaluator.icl_hf_evaluator.MccEvaluator\n",
    "+ opencompass.openicl.icl_evaluator.icl_hf_evaluator.SquadEvaluator\n",
    "+ opencompass.openicl.icl_evaluator.icl_jieba_rouge_evaluator.JiebaRougeEvaluator\n",
    "+ opencompass.openicl.icl_evaluator.icl_toxic_evaluator.ToxicEvaluator\n",
    "\n",
    "在本教程编写时，用户应可以直接使用上面列表所列的评估器。\n",
    "\n",
    "而为了使用来自 OpenCompass 的 `TEXT_POSTPROCESSORS` 模块，用户应该首先对其进行一层封装：大多数 `TEXT_POSTPROCESSORS` 函数都只支持传入一个 `str` 参数作为入参。我们需要对 `TEXT_POSTPROCESSORS` 模块进行包装，使其能够额外接收 `**kwargs` 作为入参。\n",
    "\n",
    "例如，手动编写一个 `wrapper` 函数"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from typing import Callable\n",
    "\n",
    "def open_compass_text_post_processor_wrapper(fn: Callable) -> Callable:\n",
    "    def wrapper(*args, **kwargs) -> str:\n",
    "        # 由于大部分的文本处理函数都只接收一个 str 参数作为入参，\n",
    "        # 因此我们需要手动承接一下外部额外的关键字参数\n",
    "        return fn(*args)\n",
    "\n",
    "    return wrapper"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "用户可以根据自身的需求，调整实际传入给 `TEXT_POSTPROCESSORS` 参数的入参。例如，有些 `TEXT_POSTPROCESSORS` 参数需要传入 `option` 参数。\n",
    "\n",
    "下面我们演示如何使用 OpenCompass 所提供的 Xsum 评估集和配套工具，结合千帆 Python SDK 对自训练的模型或预置模型进行评估\n",
    "\n",
    "## 导入数据集\n",
    "\n",
    "在评估之前，我们需要先导入数据集。本教程准备了一份专门用于评估的 Xsum 数据集，存放在 `data_file/opcp.jsonl` 中，用户也可以选择手动下载对应数据集进行替换"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "[INFO][2024-07-03 15:15:01.677] dataset.py:382 [t:8266636800]: construct dataset from huggingface dataset\n",
      "INFO:qianfan:construct dataset from huggingface dataset\n",
      "[INFO][2024-07-03 15:15:01.680] dataset.py:399 [t:8266636800]: construct from huggingface Dataset\n",
      "INFO:qianfan:construct from huggingface Dataset\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "{'dialogue': 'The ex-Reading defender denied fraudulent trading charges relating to the Sodje Sports Foundation - a charity to raise money for Nigerian sport.\\nMr Sodje, 37, is jointly charged with elder brothers Efe, 44, Bright, 50 and Stephen, 42.\\nAppearing at the Old Bailey earlier, all four denied the offence.\\nThe charge relates to offences which allegedly took place between 2008 and 2014.\\nSam, from Kent, Efe and Bright, of Greater Manchester, and Stephen, from Bexley, are due to stand trial in July.\\nThey were all released on bail.', 'summary': 'Former Premier League footballer Sam Sodje has appeared in court alongside three brothers accused of charity fraud.'}\n"
     ]
    }
   ],
   "source": [
    "from opencompass.datasets import xsum\n",
    "from qianfan.dataset import Dataset\n",
    "\n",
    "ds = xsum.XsumDataset.load(\"data_file/opcp.jsonl\")\n",
    "\n",
    "ds1 = Dataset.load(huggingface_dataset=ds)    \n",
    "ds1.input_columns = [\"dialogue\"]\n",
    "ds1.reference_column = \"summary\"\n",
    "\n",
    "print(ds1.list(0))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 创建评估器及评估器对象\n",
    "\n",
    "然后，我们创建评估器对象。\n",
    "\n",
    "这里使用了由 [Xsum 数据集配置](https://github.com/open-compass/opencompass/blob/fc2c9dea8ca41bebf1a1667881657b16bf2b0791/configs/datasets/Xsum/Xsum_gen_31397e.py) 所指定的 `RougeEvaluator` 和 [Xsum 数据集描述](https://github.com/open-compass/opencompass/blob/main/opencompass/datasets/xsum.py) 提供的后处理器 `Xsum_postprocess`，分别用于数据集的评估和拿到推理结果后、评估开始前的处理环节"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [],
   "source": [
    "from opencompass.openicl.icl_evaluator import RougeEvaluator\n",
    "from qianfan.evaluation import EvaluationManager\n",
    "from qianfan.evaluation.opencompass_evaluator import OpenCompassLocalEvaluator\n",
    "\n",
    "evaluator_list = [\n",
    "    OpenCompassLocalEvaluator(open_compass_evaluator=RougeEvaluator()),\n",
    "]\n",
    "\n",
    "pre_processors = [\n",
    "    open_compass_text_post_processor_wrapper(xsum.Xsum_postprocess)\n",
    "]\n",
    "\n",
    "em = EvaluationManager(local_evaluators=evaluator_list, pre_processors=pre_processors)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 开始评估以及保存结果\n",
    "\n",
    "我们只需要调用 `EvaluationManager` 的 `eval` 函数，即可完成一次评估，并且将评估结果保存到本地的文件中"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "[WARNING][2024-07-03 15:15:07.613] model.py:472 [t:8266636800]: service type should be specified before exec\n",
      "WARNING:qianfan:service type should be specified before exec\n",
      "[INFO][2024-07-03 15:15:07.615] evaluation_manager.py:497 [t:8266636800]: start to inference in batch during evaluation\n",
      "INFO:qianfan:start to inference in batch during evaluation\n",
      "[INFO][2024-07-03 15:15:07.616] dataset_utils.py:455 [t:11254542336]: prompt template detected, start to check template variables\n",
      "INFO:qianfan:prompt template detected, start to check template variables\n",
      "[INFO][2024-07-03 15:15:09.633] base.py:102 [t:11271368704]: All tasks finished, exeutor will be shutdown\n",
      "INFO:qianfan:All tasks finished, exeutor will be shutdown\n",
      "[INFO][2024-07-03 15:15:09.635] evaluation_manager.py:521 [t:8266636800]: start to evaluate llm 0\n",
      "INFO:qianfan:start to evaluate llm 0\n",
      "[INFO][2024-07-03 15:15:10.763] evaluation_manager.py:549 [t:8266636800]: start to merge evaluation result dataset\n",
      "INFO:qianfan:start to merge evaluation result dataset\n",
      "[INFO][2024-07-03 15:15:10.765] dataset.py:481 [t:8266636800]: no destination data source was provided, construct\n",
      "INFO:qianfan:no destination data source was provided, construct\n",
      "[INFO][2024-07-03 15:15:10.765] dataset.py:276 [t:8266636800]: construct a file data source from path: eval_result.json, with args: {}\n",
      "INFO:qianfan:construct a file data source from path: eval_result.json, with args: {}\n",
      "[INFO][2024-07-03 15:15:10.765] file.py:291 [t:8266636800]: use format type FormatType.Json\n",
      "INFO:qianfan:use format type FormatType.Json\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "inputs: {'predictions': ['Four people, including a former Reading defender, have denied fraudulent trading charges relating to a charity set up to raise money for Nigerian sport. The case is due to stand trial in July.'], 'references': ['Former Premier League footballer Sam Sodje has appeared in court alongside three brothers accused of charity fraud.']}\n",
      "{'rouge1': Score(precision=0.09090909090909091, recall=0.17647058823529413, fmeasure=0.12000000000000001), 'rouge2': Score(precision=0.0, recall=0.0, fmeasure=0.0), 'rougeL': Score(precision=0.06060606060606061, recall=0.11764705882352941, fmeasure=0.08), 'rougeLsum': Score(precision=0.06060606060606061, recall=0.11764705882352941, fmeasure=0.08)}\n",
      "{'rouge1': Score(precision=0.09090909090909091, recall=0.17647058823529413, fmeasure=0.12000000000000001), 'rouge2': Score(precision=0.0, recall=0.0, fmeasure=0.0), 'rougeL': Score(precision=0.06060606060606061, recall=0.11764705882352941, fmeasure=0.08), 'rougeLsum': Score(precision=0.06060606060606061, recall=0.11764705882352941, fmeasure=0.08)}\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "<qianfan.dataset.dataset.Dataset at 0x1738e33a0>"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from qianfan.common import Prompt\n",
    "from qianfan.model import Service\n",
    "\n",
    "# 指定请求时的 Prompt 模板，对齐 Xsum 的推理部分\n",
    "template='Document：{dialogue}\\n''Based on the previous text, provide a brief single summary:'\n",
    "prompt = Prompt(template)\n",
    "\n",
    "# 使用 ERNIE-Speed-128K 进行推理\n",
    "eb_turbo_service = Service(endpoint=\"ernie-speed-128k\")\n",
    "result_ds = em.eval([eb_turbo_service], dataset = ds1, prompt_template = prompt).result_dataset\n",
    "\n",
    "# 保存推理结果到本地文件\n",
    "result_ds.save(data_file=\"eval_result.json\")"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "bce-qianfan-sdk-new",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.1.-1"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
